Binary Deep Neural Networks for Speech Recognition
نویسندگان
چکیده
Deep neural networks (DNNs) are widely used in most current automatic speech recognition (ASR) systems. To guarantee good recognition performance, DNNs usually require significant computational resources, which limits their application to low-power devices. Thus, it is appealing to reduce the computational cost while keeping the accuracy. In this work, in light of the success in image recognition, binary DNNs are utilized in speech recognition, which can achieve competitive performance and substantial speed up. To our knowledge, this is the first time that binary DNNs have been used in speech recognition. For binary DNNs, network weights and activations are constrained to be binary values, which enables faster matrix multiplication based on bit operations. By exploiting the hardware population count instructions, the proposed binary matrix multiplication can achieve 5 ∼ 7 times speed up compared with highly optimized floating-point matrix multiplication. This results in much faster DNN inference since matrix multiplication is the most computationally expensive operation. Experiments on both TIMIT phone recognition and a 50-hour Switchboard speech recognition show that, binary DNNs can run about 4 times faster than standard DNNs during inference, with roughly 10.0% relative accuracy reduction.
منابع مشابه
شبکه عصبی پیچشی با پنجرههای قابل تطبیق برای بازشناسی گفتار
Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
متن کاملSpeech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
متن کاملمعرفی شبکه های عصبی پیمانه ای عمیق با ساختار فضایی-زمانی دوگانه جهت بهبود بازشناسی گفتار پیوسته فارسی
In this article, growable deep modular neural networks for continuous speech recognition are introduced. These networks can be grown to implement the spatio-temporal information of the frame sequences at their input layer as well as their labels at the output layer at the same time. The trained neural network with such double spatio-temporal association structure can learn the phonetic sequence...
متن کاملPersian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
متن کاملIsolated Word Speech Recognition System Using Deep Neural Networks
Speech recognition is the process of converting speech signals into words. For acoustic modeling HMM-GMM is used for many years. For GMM, it requires assumptions near the data distribution for calculating probabilities. For removing this limitation, GMM is replaced by DNN in acoustic model. Deep neural networks are the feed forward neural networks having more than one or multiple layers of hidd...
متن کامل